[tune] Change the log syncing behavior by hartikainen · Pull Request #4450 · ray-project/ray

hartikainen · 2019-03-21T22:58:19Z

What do these changes do?

Refactor the log sync behavior.

TODOs:

Related issue number

@richardliaw

AmplabJenkins · 2019-03-21T22:58:39Z

Can one of the admins verify this patch?

AmplabJenkins · 2019-03-21T23:39:23Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13169/
Test FAILed.

AmplabJenkins · 2019-04-04T20:42:07Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13545/
Test FAILed.

AmplabJenkins · 2019-04-05T00:10:02Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13575/
Test FAILed.

AmplabJenkins · 2019-04-05T00:10:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/13576/
Test FAILed.

AmplabJenkins · 2019-04-30T23:18:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14020/
Test FAILed.

AmplabJenkins · 2019-05-01T00:41:48Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14021/
Test FAILed.

AmplabJenkins · 2019-05-16T21:22:37Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14254/
Test FAILed.

AmplabJenkins · 2019-05-16T22:34:24Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14260/
Test FAILed.

AmplabJenkins · 2019-05-16T22:57:05Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14262/
Test FAILed.

AmplabJenkins · 2019-05-16T23:03:23Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14264/
Test FAILed.

AmplabJenkins · 2019-05-17T01:03:57Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14274/
Test FAILed.

AmplabJenkins · 2019-07-01T19:18:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15004/
Test PASSed.

AmplabJenkins · 2019-07-01T19:27:15Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15003/
Test PASSed.

hartikainen · 2019-07-02T03:18:03Z

doc/source/tune-usage.rst

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Tune automatically persists the progress of your experiments, so if an experiment crashes or is otherwise cancelled, it can be resumed with ``resume=True``. The default setting of ``resume=False`` creates a new experiment, and ``resume="prompt"`` will cause Tune to prompt you for whether you want to resume. You can always force a new experiment to be created by changing the experiment name.
+Tune automatically persists the progress of your experiments, so if an experiment crashes or is otherwise cancelled, it can be resumed by passing one of True, False, "LOCAL", "REMOTE", or "PROMPT" to ``tune.run(resume=...)``. The default setting of ``resume=False`` creates a new experiment. ``resume="LOCAL"`` and ``resume=True`` restore the experiment from ``local_dir/[experiment_name]``. ``resume="REMOTE"`` syncs the upload dir down to the local dir and then restore the experiment from ``local_dir/experiment_name``. ``resume="PROMPT"`` will cause Tune to prompt you for whether you want to resume. You can always force a new experiment to be created by changing the experiment name.


Suggested change

Tune automatically persists the progress of your experiments, so if an experiment crashes or is otherwise cancelled, it can be resumed by passing one of True, False, "LOCAL", "REMOTE", or "PROMPT" to ``tune.run(resume=...)``. The default setting of ``resume=False`` creates a new experiment. ``resume="LOCAL"`` and ``resume=True`` restore the experiment from ``local_dir/[experiment_name]``. ``resume="REMOTE"`` syncs the upload dir down to the local dir and then restore the experiment from ``local_dir/experiment_name``. ``resume="PROMPT"`` will cause Tune to prompt you for whether you want to resume. You can always force a new experiment to be created by changing the experiment name.

Tune automatically persists the progress of your experiments, so if an experiment crashes or is otherwise cancelled, it can be resumed by passing one of True, False, "LOCAL", "REMOTE", or "PROMPT" to ``tune.run(resume=...)``. The default setting of ``resume=False`` creates a new experiment. ``resume="LOCAL"`` and ``resume=True`` restore the experiment from ``local_dir/[experiment_name]``. ``resume="REMOTE"`` syncs the upload dir down to the local dir and then restores the experiment from ``local_dir/experiment_name``. ``resume="PROMPT"`` will cause Tune to prompt you for whether you want to resume. You can always force a new experiment to be created by changing the experiment name.
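To make the documented `resume` semantics concrete, here is a minimal sketch that maps each value to the steps the paragraph above describes. It is not Tune's implementation: `plan_resume` is a hypothetical helper, the step strings are illustrative, and raising an error when `upload_dir` is missing is an assumption rather than confirmed behavior.

```python
import os


def plan_resume(resume, local_dir, experiment_name, upload_dir=None):
    """Return the ordered steps implied by a ``resume`` value (illustrative only)."""
    # Hypothetical helper that models the documented behavior; not Tune's code.
    experiment_dir = os.path.join(local_dir, experiment_name)
    if resume is False:
        # Default: start from scratch.
        return ["create a new experiment"]
    if resume is True or resume == "LOCAL":
        # True and "LOCAL" are documented as equivalent.
        return ["restore from " + experiment_dir]
    if resume == "REMOTE":
        if upload_dir is None:
            # Assumption: a remote resume needs somewhere to sync from.
            raise ValueError('resume="REMOTE" needs an upload_dir to sync from')
        return [
            "sync " + upload_dir + " down to " + local_dir,
            "restore from " + experiment_dir,
        ]
    if resume == "PROMPT":
        return ["ask the user whether to resume"]
    raise ValueError("unsupported resume value: {!r}".format(resume))
```

For example, `plan_resume("REMOTE", "results", "my_exp", upload_dir="s3://bucket/results")` yields a sync step followed by a restore step; the bucket name is made up for illustration.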

hartikainen · 2019-07-02T05:28:12Z

Looks good to me!

AmplabJenkins · 2019-07-02T11:59:32Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15023/
Test PASSed.

richardliaw · 2019-07-02T19:05:47Z

Awesome!

richardliaw · 2019-07-02T19:07:52Z

python/ray/tune/tune.py

-        else:
-            logger.info("Tip: to resume incomplete experiments, "
-                        "pass resume='prompt' or resume=True to run()")
+def _get_resume_path(local_checkpoint_dir, remote_checkpoint_dir):


this is extraneous

richardliaw · 2019-07-02T19:09:17Z

python/ray/tune/trial_runner.py

+    def _validate_resume(self, resume_type):
+        """
+        Args:
+            resume_type: One of "REMOTE", "LOCAL", "PROMPT".


Suggested change

resume_type: One of "REMOTE", "LOCAL", "PROMPT".

resume_type: One of "REMOTE", "LOCAL", True, "PROMPT".
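A check consistent with the suggested docstring might look like the sketch below. The body of `_validate_resume` is not shown in this thread, so `validate_resume_sketch` is a hypothetical stand-in; it assumes the value is matched exactly, with no case normalization, and that invalid input raises `ValueError`.

```python
def validate_resume_sketch(resume_type):
    """Return resume_type if it is True, "LOCAL", "REMOTE", or "PROMPT"."""
    # Identity check for True so that the integer 1 is not silently accepted.
    if resume_type is True or resume_type in ("LOCAL", "REMOTE", "PROMPT"):
        return resume_type
    # Assumption: anything else is rejected with a ValueError.
    raise ValueError(
        'resume_type must be True, "LOCAL", "REMOTE", or "PROMPT"; '
        "got {!r}".format(resume_type)
    )
```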

richardliaw · 2019-07-02T19:09:49Z

python/ray/tune/trial_runner.py

-        self._metadata_checkpoint_dir = metadata_checkpoint_dir
+        self._local_checkpoint_dir = local_checkpoint_dir
+
+        # TODO(rliaw): This may fail


Suggested change

# TODO(rliaw): This may fail

richardliaw · 2019-07-02T19:12:28Z

python/ray/tune/syncer.py

+    Args:
+        local_dir: Source directory for syncing.
+        remote_dir: Target directory for syncing. If None,
+            returns NoopSyncer.


Suggested change

returns NoopSyncer.

returns BaseSyncer with a noop.
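The "syncer with a noop" idea in this docstring can be sketched as follows. `SketchSyncer` and `get_syncer_sketch` are hypothetical names chosen for illustration and do not reproduce the classes in `python/ray/tune/syncer.py`; the `sync_function` parameter and the boolean return value are likewise assumptions.

```python
class SketchSyncer:
    """Tiny stand-in for a syncer: copies local_dir to remote_dir on request."""

    def __init__(self, local_dir, remote_dir, sync_function=None):
        self.local_dir = local_dir
        self.remote_dir = remote_dir
        # A sync_function of None turns every sync into a no-op.
        self.sync_function = sync_function

    def sync_up(self):
        """Run the sync function; return True if anything was attempted."""
        if self.sync_function is None:
            return False
        self.sync_function(self.local_dir, self.remote_dir)
        return True


def get_syncer_sketch(local_dir, remote_dir=None, sync_function=None):
    """Return a syncer; without a remote_dir it is a no-op syncer."""
    if remote_dir is None:
        # No target directory: callers still get an object with the same
        # interface, so they never have to special-case "syncing disabled".
        return SketchSyncer(local_dir, None, sync_function=None)
    return SketchSyncer(local_dir, remote_dir, sync_function)
```

Returning an object with the same interface in both cases keeps call sites free of `if syncer is not None` checks, which is one plausible reason to prefer a base syncer with a no-op over a separate code path.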

AmplabJenkins · 2019-07-02T21:58:58Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15030/
Test PASSed.

AmplabJenkins · 2019-07-02T22:21:27Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15032/
Test FAILed.

AmplabJenkins · 2019-07-03T01:45:26Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15040/
Test FAILed.

Change the log syncing behavior

6527abf

Merge branch 'master' into bunch-of-log-sync-fixes

e760991

richardliaw added 3 commits April 4, 2019 15:03

fix up abstractions for syncer

2ada6db

Finished checkpoint syncing

26fe09b

Code

f45110d

Set of changes to get things running

ee5c61d

Fixes for log syncing

045bfa4

richardliaw self-assigned this May 2, 2019

richardliaw mentioned this pull request May 10, 2019

[tune] tf.summary.FileWriter extensibility for custom TensorBoard metrics #4762

Closed

Merge branch 'master' into bunch-of-log-sync-fixes

c5b1731

richardliaw added 4 commits May 16, 2019 15:03

Fix parts

7d7ced1

Merge branch 'tune-submit-fix' into bunch-of-log-sync-fixes

5ce47d7

Lint and other fixes

979a04c

fix some test

91dad93

richardliaw added 2 commits May 16, 2019 15:40

Remove extra parsing functionality

e3ecc72

Merge branch 'tune-relax-configs' into bunch-of-log-sync-fixes

26a538f

richardliaw added 2 commits May 16, 2019 17:23

some test fixes

b0f6218

Fix up cloud syncing

5ca8eca

richardliaw added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jul 1, 2019

richardliaw added 2 commits July 1, 2019 09:25

Update test_cluster.py

fdebff3

betterdoc

cd135d7

hartikainen commented Jul 2, 2019

Update tune-usage.rst

652050f

richardliaw approved these changes Jul 2, 2019

richardliaw reviewed Jul 2, 2019

richardliaw added 4 commits July 2, 2019 12:19

cleanup

94e3cac

Merge branch 'bunch-of-log-sync-fixes' of github.com:hartikainen/ray into bunch-of-log-sync-fixes

6243211

Merge branch 'master' into bunch-of-log-sync-fixes

d3cefa8

nit

f825ec0

richardliaw merged commit 9e0192b into ray-project:master Jul 3, 2019

hartikainen deleted the bunch-of-log-sync-fixes branch July 3, 2019 04:36

lscheinkman added a commit to lscheinkman/nupic.research that referenced this pull request Jan 8, 2021

Fix ray.tune api change. See ray-project/ray#4450

3699497

lscheinkman mentioned this pull request Jan 8, 2021

Fix ray.tune api change numenta/nupic.research#432

Merged

hartikainen commented Mar 21, 2019 • edited by richardliaw

3 participants
